Functional motifs in Escherichia coli NC101.

Escherichia coli (E. coli) bacteria can damage the DNA of gut lining cells and, according to recent reports, may encourage the development of colon cancer. Genetic switches are specific sequence motifs, and many of them are drug targets, so it is useful to know which motifs occur and where they are located in sequences. In the present study, the Gibbs sampler algorithm was used to predict functional motifs in E. coli NC101 contig 1. The whole genomic sequence of E. coli NC101 contig 1 was retrieved from http://www.ncbi.nlm.nih.gov (NCBI Reference Sequence: NZ_AEFA01000001.1) and analyzed with DAMBE software and BLAST. The results showed that the 6-mer motif is CUGGAA in most sequences (genes 1-3, 8, 9, 12, 14-18, 20-23, 25, 27, 29, 31-34), CUUGUA for gene 4, CUGUAA for gene 5, CUGAUG for gene 6, CUGAUA for gene 7, CUGAAA for genes 10, 11, 13, 26 and 28, CUGGAG for gene 19, and CUGGUA for gene 30. It is concluded that the 6-mer motif is CUGGAA in most sequences in E. coli NC101 contig 1. The present study may help experimental studies on elucidating the pharmacological and phylogenic functions of the motifs in E. coli.

Stanislaw Ulam and developed by nuclear weapon projects in USA (6). The Gibbs sampler has been used to identify functional motifs in proteins (7), in multiple sequence alignment (8), and in biological image processing (9). The main element of a Gibbs sampler is the position weight matrix (PWM). The PWM score (PWMS) has been reported as a measure of motif strength (5). Escherichia coli is a facultatively anaerobic bacterium and among the most common bacteria in the human intestinal flora. E. coli colonizes the intestine within a few days after birth and persists there throughout human life.
Strains of E. coli can be categorized into four main groups (A, B1, B2, and D), and group B2 strains can persist in the colon longer than the others (10). E. coli strains of the B2 phylotype (e.g. E. coli NC101) have been reported to carry a genomic pks island (a gene cluster coding for nonribosomal peptide synthetases, NRPS, and polyketide synthases, PKS) and to produce colibactin, a peptide-polyketide genotoxin that can damage DNA through double-strand breaks (DSBs) (11) and may promote the development of colon cancer (12). In the present study, the Gibbs sampler was used with DAMBE software and BLAST to identify functional motifs in E. coli strain NC101 contig 1.

Materials and Methods
This investigation was started in the spring of 2013, and the data analysis was performed at the bioinformatics facility of the Faculty of Science

In equation (1), i = 1, 2, 3 and 4 correspond to A, C, G and U, respectively; j is the site index; p_i is the background frequency of nucleotide i; and p_ij is the site-specific frequency of nucleotide i at site j. The PWMS for a particular motif is computed as in equation (2), where L is the length of the motif (5).
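As an illustration of these definitions, the following Python sketch computes a PWM and a PWMS from a set of aligned motif sites. It assumes the common log-odds form PWM_ij = log2(p_ij / p_i), with the PWMS taken as the sum of PWM_ij over the L sites of a motif. The base-2 logarithm, the pseudocount, the uniform background and the function names are assumptions made for this example, and the example sites are a small illustrative set rather than the study data.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # i = 1, 2, 3, 4 in the text


def position_weight_matrix(sites, background, pseudocount=0.5):
    """Return pwm[j][nuc] = log2(p_ij / p_i) for aligned, equal-length sites.

    The pseudocount is an assumption; it keeps unseen nucleotides from
    producing log2(0).
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for j in range(length):
        counts = Counter(site[j] for site in sites)
        column = {}
        for nuc in ALPHABET:
            p_ij = (counts[nuc] + pseudocount) / total  # site-specific frequency
            column[nuc] = math.log2(p_ij / background[nuc])
        pwm.append(column)
    return pwm


def pwms(motif, pwm):
    """Sum the PWM entries along a motif of length L (the PWMS)."""
    return sum(pwm[j][nuc] for j, nuc in enumerate(motif))


# Illustrative input only: a few 6-mers of the kind reported in the abstract.
sites = ["CUGGAA", "CUGGAA", "CUGAAA", "CUGGAG"]
background = {nuc: 0.25 for nuc in ALPHABET}  # assumed uniform background
pwm = position_weight_matrix(sites, background)
for candidate in ("CUGGAA", "CUGAAA", "AAAAAA"):
    print(candidate, round(pwms(candidate, pwm), 3))
```

Under these assumptions, a candidate that matches the dominant nucleotide at every site receives the highest PWMS, which is the sense in which the score measures motif strength.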

Results
The Gibbs sampler in DAMBE was employed to identify functional motifs in E. coli NC101 contig 1. Figure 1 shows the shared motif, highlighted in red in an aligned format; it is CUGGAA in most sequences (Fig. 1b). The main Gibbs sampler output is the sequences with aligned motifs (Fig. 1b) and a site-specific frequency matrix (position weight matrix; Table 1d). The output showed that the 6-mer motif is CUGGAA. The site-specific frequencies and the PWM are shown in Table 1c and d and can be used to scan other sequences for the presence of such motifs. The last part of the results (Table 2) shows the motif start points. As Table 2 also shows, the 6-mer motif is CUGGAA in most sequences. Figure 2 shows the scatter diagram of S1D and S2D with the sequence lengths of E. coli NC101 contig 1.
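The kind of output described here (one motif start per sequence plus a site-specific count matrix) can be illustrated with a minimal Gibbs site sampler in Python. This is a simplified sketch and not the DAMBE implementation: it assumes exactly one motif occurrence per sequence, pools all sequences to estimate the background, and uses a fixed number of iterations. The function name, the parameters and the toy sequences are hypothetical.

```python
import random

ALPHABET = "ACGU"


def gibbs_site_sampler(seqs, width, iterations=200, pseudocount=1.0, seed=0):
    """Return (starts, counts): one sampled motif start per sequence and the
    site-specific count matrix (pseudocounts included) of the final alignment."""
    rng = random.Random(seed)
    # Start from a random motif position in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]

    # Simplification: background frequencies come from all sequences pooled.
    pooled = "".join(seqs)
    denom = len(pooled) + len(ALPHABET) * pseudocount
    background = {n: (pooled.count(n) + pseudocount) / denom for n in ALPHABET}

    def motif_counts(skip=None):
        """Count nucleotides per motif site, leaving out sequence `skip`."""
        counts = [{n: pseudocount for n in ALPHABET} for _ in range(width)]
        for k, (seq, start) in enumerate(zip(seqs, starts)):
            if k != skip:
                for j in range(width):
                    counts[j][seq[start + j]] += 1
        return counts

    for _ in range(iterations):
        for k, seq in enumerate(seqs):
            counts = motif_counts(skip=k)
            total = len(seqs) - 1 + len(ALPHABET) * pseudocount
            # Odds ratio (motif model vs background) for every window.
            weights = []
            for pos in range(len(seq) - width + 1):
                odds = 1.0
                for j in range(width):
                    nuc = seq[pos + j]
                    odds *= (counts[j][nuc] / total) / background[nuc]
                weights.append(odds)
            # Sample the new start in proportion to the odds ratios.
            starts[k] = rng.choices(range(len(weights)), weights=weights)[0]

    return starts, motif_counts()


# Hypothetical toy input: four short sequences, each containing CUGGAA.
toy = ["AUCUGGAAGC", "GGCUGGAAUA", "UACCUGGAAG", "CUGGAAUUCA"]
starts, counts = gibbs_site_sampler(toy, width=6)
for seq, start in zip(toy, starts):
    print(start, seq[start:start + 6])
```

Because the starts are sampled rather than chosen greedily, different seeds can give different alignments on small inputs; in practice the run is usually repeated and the alignment with the strongest PWM retained.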

Discussion
Finding motifs in DNA sequences is a central problem in computational molecular biology. Among the many computational methods, the Gibbs sampling algorithm shows great promise for finding functional motifs in coexpressed genes (13). Motif finding is becoming an important toolbox for microbiologists, like other DNA and protein computational molecular

Figure 1. The sequences of E. coli NC101 contig 1. The upper panel shows the data input to the Gibbs sampler (a). The lower panel shows the output motifs (i.e., CUGGAA; in red) across the sequences (b). S1-S34 correspond to sequence 1 to sequence 34.
PWMS is the log-odds ratio, and the strongest motif has the highest PWMS or odds ratio (5). Our results may help the mentioned scenario by discovering the functional genetic switches and motifs in E. coli NC101 contig 1. The branch point sequence (BPS) could be located anywhere, but it is preferably near the 3' site rather than the 5' site. Experiments that mutate each nucleotide of the sequence between the donor and acceptor sites step by step could certainly be performed, but this is very tedious and difficult. Therefore, one can run the Gibbs sampler to find all the BPSs. The BPS divides each E. coli NC101 contig 1 sequence into two sections: the upstream part stretching from the 5' site to the BPS (the S1 sequence), and the downstream part from the BPS to the 3' site (the S2 sequence). The lengths of the S1 and S2 sequences are termed the S1 and S2 distances (S1D and S2D). If the BPS is constrained to be near the 3' site, the S2 distance is smaller than the S1 distance, and vice versa (5).
The scatter diagram of S1D and S2D for the E. coli NC101 contig 1 sequences is shown in Figure 2.
The results showed that most S2D values were higher than the S1D values (650.11±157.24 and 270.61±37.17, respectively).
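The S1 and S2 distances can be derived from the motif start points with a short Python sketch such as the one below. It assumes 0-based start positions, a 6-nucleotide motif that is counted in neither S1 nor S2, and a summary given as mean and sample standard deviation; the text does not state whether the reported ± values are standard deviations or standard errors, so that choice is an assumption. The function names and the toy sequences are hypothetical.

```python
from statistics import mean, stdev


def split_distances(seq, motif_start, motif_len=6):
    """Return (S1D, S2D): nucleotides upstream and downstream of the motif."""
    s1d = motif_start                            # 5' end up to the motif
    s2d = len(seq) - (motif_start + motif_len)   # motif end up to the 3' end
    if s1d < 0 or s2d < 0:
        raise ValueError("motif does not fit inside the sequence")
    return s1d, s2d


def summarize(values):
    """Mean and sample standard deviation (needs at least two values)."""
    return mean(values), stdev(values)


# Hypothetical toy data: (sequence, motif start) pairs, not the study data.
examples = [("AUCUGGAAGC", 2), ("CUGGAAUUCA", 0), ("UACCUGGAAG", 3)]
s1_values, s2_values = zip(*(split_distances(s, st) for s, st in examples))
print("S1D", summarize(s1_values))
print("S2D", summarize(s2_values))
```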
The present study may help experimental studies on elucidating the pharmacological and phylogenic functions of the motifs in E. coli.